Consistent Feature Construction with Constrained Genetic Programming for Experimental Physics
A good feature representation is a key factor in achieving high classification
performance with many machine learning algorithms. This is especially true for
techniques that do not build complex internal representations of the data
(e.g. decision trees, in contrast to deep neural networks). To transform the
feature space, feature construction techniques build new high-level features
from the original ones. Among these techniques, Genetic Programming is a good
candidate for providing the interpretable features required for data analysis
in high-energy physics. Classically, the original features, or higher-level
features derived from physics first principles, are used as inputs for
training. However, physicists would benefit from automatic and interpretable
feature construction for the classification of particle collision events.
Our main contribution consists of combining different aspects of Genetic
Programming and applying them to feature construction for experimental physics.
In particular, to make the approach applicable to physics, dimensional
consistency is enforced using grammars.
Results of experiments on three physics datasets show that the constructed
features can bring a significant gain in classification accuracy. To the best
of our knowledge, this is the first time that a method for interpretable
feature construction with units of measurement has been proposed, and that
experts in high-energy physics have validated both the overall approach and the
interpretability of the built features.
Comment: Accepted in this version to CEC 201
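The dimensional-consistency constraint can be illustrated with a small unit checker over expression trees. This is a minimal sketch under stated assumptions, not the paper's grammar: units are encoded as a single exponent of GeV (HEP natural units), and the feature names `pt1`, `pt2`, `m` and `eta` are hypothetical. A grammar would rule out inconsistent trees by construction; here the same rule is checked after the fact.

```python
from fractions import Fraction

# Hypothetical features; each unit is the exponent of GeV
# (GeV -> 1, GeV^2 -> 2, dimensionless -> 0). Single-dimension encoding is an assumption.
UNITS = {"pt1": Fraction(1), "pt2": Fraction(1), "m": Fraction(1), "eta": Fraction(0)}


def unit_of(expr, units=UNITS):
    """Return the GeV exponent of an expression tree, or raise ValueError if inconsistent.

    An expression is either a feature name (leaf) or a tuple (op, arg1[, arg2]).
    """
    if isinstance(expr, str):  # leaf: an original feature
        return units[expr]
    op, *args = expr
    us = [unit_of(a, units) for a in args]
    if op in ("+", "-"):  # only quantities sharing a unit may be added or subtracted
        if us[0] != us[1]:
            raise ValueError(f"'{op}' mixes GeV^{us[0]} and GeV^{us[1]}")
        return us[0]
    if op == "*":  # exponents add
        return us[0] + us[1]
    if op == "/":  # exponents subtract
        return us[0] - us[1]
    if op == "sqrt":  # exponent is halved (may become fractional)
        return us[0] / 2
    raise ValueError(f"unknown operator {op!r}")


def is_consistent(expr, units=UNITS):
    """True if the expression tree is dimensionally consistent."""
    try:
        unit_of(expr, units)
        return True
    except ValueError:
        return False
```

For example, `("sqrt", ("*", "pt1", "pt2"))` is a valid feature in GeV, while `("+", "pt1", "eta")` is rejected because it adds a momentum to a dimensionless quantity.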
Sim-to-Real Domain Adaptation For High Energy Physics
Particle physics, or High Energy Physics (HEP), studies the elementary constituents of matter and their interactions. Machine Learning (ML) has played an important role in HEP analysis and has proven extremely successful in this area. Usually, ML algorithms are trained on numerical simulations of the experimental setup and then applied to real experimental data. However, any discrepancy between simulation and real data can severely degrade the performance of the algorithm on real data. In this paper, we present an application of domain adaptation using a Domain Adversarial Neural Network (DANN) trained on public HEP data. We demonstrate that this approach achieves sim-to-real transfer and keeps the performance of the ML algorithm consistent across real and simulated HEP datasets.
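The core mechanism of a Domain Adversarial Neural Network is the gradient-reversal update: the label head and the domain head each minimise their own loss, while the shared feature extractor follows the label gradient but the *negated* domain gradient, pushing it towards features that do not distinguish simulation from real data. The sketch below shows one such step on a toy one-hidden-layer network with hand-written NumPy gradients; the architecture, the names `W`, `v`, `u`, and the values of `lam` and `lr` are illustrative assumptions, not the network used in the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y):
    """Mean binary cross-entropy."""
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def dann_step(params, x_src, y_src, x_tgt, lam=0.1, lr=0.1):
    """One gradient step of a toy DANN; returns (new_params, label_loss, domain_loss).

    params = (W, v, u): tanh feature extractor, label head, domain head.
    Source events (simulation) are labelled; target events (real data) are not.
    Losses are those computed before the update.
    """
    W, v, u = params
    h_s = np.tanh(x_src @ W)  # features of labelled source events
    x_all = np.vstack([x_src, x_tgt])
    h_all = np.tanh(x_all @ W)  # features of all events
    d_all = np.concatenate([np.zeros(len(x_src)), np.ones(len(x_tgt))])  # 0 = sim, 1 = real

    # Label head: trained on labelled source events only.
    p_y = sigmoid(h_s @ v)
    g_zy = (p_y - y_src) / len(x_src)  # dL_y / dlogit
    g_v = h_s.T @ g_zy
    g_W_y = x_src.T @ (np.outer(g_zy, v) * (1 - h_s**2))

    # Domain head: tries to tell simulated from real events.
    p_d = sigmoid(h_all @ u)
    g_zd = (p_d - d_all) / len(x_all)  # dL_d / dlogit
    g_u = h_all.T @ g_zd
    g_W_d = x_all.T @ (np.outer(g_zd, u) * (1 - h_all**2))

    # Gradient reversal: the feature extractor descends the label loss
    # but ascends the domain loss, scaled by lam.
    W_new = W - lr * (g_W_y - lam * g_W_d)
    v_new = v - lr * g_v
    u_new = u - lr * g_u
    return (W_new, v_new, u_new), bce(p_y, y_src), bce(p_d, d_all)
```

In use, `dann_step` would be called repeatedly on batches of labelled simulated events and unlabelled real events; setting `lam=0` recovers ordinary training of the label classifier, with the domain head trained alongside but having no effect on the features.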
Embedded Constrained Feature Construction for High-Energy Physics Data Classification
Before any publication, the data analysis of a high-energy physics experiment must be validated. This validation is granted only if a thorough understanding of the data and of the analysis process is demonstrated. Physicists therefore prefer transparent machine learning algorithms, whose performance relies heavily on the suitability of the input features provided. To transform the feature space, feature construction aims at automatically generating new relevant features. Whereas most previous work in this area performs feature construction prior to model training, we propose a general framework that embeds a feature construction technique, adapted to the constraints of high-energy physics, in the induction of tree-based models. Experiments on two high-energy physics datasets confirm that a significant gain is obtained in classification scores while limiting the number of built features. Since the features are built to be interpretable, the whole model is transparent and readable.
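Embedding feature construction in tree induction means that, at each node, constructed features compete with the original ones when the split is chosen, instead of being generated once before training. The sketch below shows this for a single node under simplifying assumptions: candidates are built exhaustively as pairwise products, plus pairwise differences only between features sharing a unit (a stand-in for the physics constraints), and splits are scored with the Gini impurity. The function names and the candidate set are hypothetical; the paper's actual construction technique is not reproduced here.

```python
import itertools

import numpy as np


def gini(y):
    """Gini impurity of a binary label array."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2 * p * (1 - p)


def best_threshold(f, y):
    """Return (weighted Gini, threshold) of the best split on one feature column."""
    order = np.argsort(f)
    f_sorted, y_sorted = f[order], y[order]
    best = (np.inf, None)
    for i in range(1, len(f)):
        if f_sorted[i] == f_sorted[i - 1]:
            continue  # no threshold separates equal values
        left, right = y_sorted[:i], y_sorted[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[0]:
            best = (score, (f_sorted[i] + f_sorted[i - 1]) / 2)
    return best


def best_split_with_construction(X, y, names, units):
    """Score original and constructed features at one node; return (score, name, threshold).

    Hypothetical candidate set: all pairwise products, plus pairwise
    differences only when both operands share a unit.
    """
    candidates = {n: X[:, j] for j, n in enumerate(names)}
    for (i, a), (j, b) in itertools.combinations(enumerate(names), 2):
        candidates[f"{a}*{b}"] = X[:, i] * X[:, j]
        if units[a] == units[b]:  # dimensional constraint on '-'
            candidates[f"{a}-{b}"] = X[:, i] - X[:, j]
    best = (np.inf, None, None)
    for name, f in candidates.items():
        score, thr = best_threshold(f, y)
        if score < best[0]:
            best = (score, name, thr)
    return best
```

A full tree learner would call `best_split_with_construction` recursively on each child's subset of events, so deeper nodes can select different constructed features; each selected feature remains a readable formula, which keeps the model transparent.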